Analyzing New York City 311 Customer Service from 2016 to Present

In [2]:
#import pandas library aliased as pd 
import pandas as pd
# Initialize df variable as dataframe
df = pd.DataFrame()
print(df)
Empty DataFrame
Columns: []
Index: []

1. Import the dataset stored in 'Customer Service_Requests_from_2010_to_Present.csv'¶

In [4]:
#store csv data into a data frame
df = pd.read_csv("Customer Service_Requests_from_2010_to_Present.csv", low_memory=False)
#print first few data in the dataframe
print(df.head())
   Unique Key            Created Date    Closed Date Agency  \
0    32310363  12/31/2015 11:59:45 PM  01-01-16 0:55   NYPD   
1    32309934  12/31/2015 11:59:44 PM  01-01-16 1:26   NYPD   
2    32309159  12/31/2015 11:59:29 PM  01-01-16 4:51   NYPD   
3    32305098  12/31/2015 11:57:46 PM  01-01-16 7:43   NYPD   
4    32306529  12/31/2015 11:56:58 PM  01-01-16 3:24   NYPD   

                       Agency Name           Complaint Type  \
0  New York City Police Department  Noise - Street/Sidewalk   
1  New York City Police Department         Blocked Driveway   
2  New York City Police Department         Blocked Driveway   
3  New York City Police Department          Illegal Parking   
4  New York City Police Department          Illegal Parking   

                     Descriptor    Location Type  Incident Zip  \
0              Loud Music/Party  Street/Sidewalk       10034.0   
1                     No Access  Street/Sidewalk       11105.0   
2                     No Access  Street/Sidewalk       10458.0   
3  Commercial Overnight Parking  Street/Sidewalk       10461.0   
4              Blocked Sidewalk  Street/Sidewalk       11373.0   

        Incident Address  ... Bridge Highway Name Bridge Highway Direction  \
0    71 VERMILYEA AVENUE  ...                 NaN                      NaN   
1        27-07 23 AVENUE  ...                 NaN                      NaN   
2  2897 VALENTINE AVENUE  ...                 NaN                      NaN   
3    2940 BAISLEY AVENUE  ...                 NaN                      NaN   
4          87-14 57 ROAD  ...                 NaN                      NaN   

  Road Ramp Bridge Highway Segment Garage Lot Name Ferry Direction  \
0       NaN                    NaN             NaN             NaN   
1       NaN                    NaN             NaN             NaN   
2       NaN                    NaN             NaN             NaN   
3       NaN                    NaN             NaN             NaN   
4       NaN                    NaN             NaN             NaN   

  Ferry Terminal Name   Latitude  Longitude  \
0                 NaN  40.865682 -73.923501   
1                 NaN  40.775945 -73.915094   
2                 NaN  40.870325 -73.888525   
3                 NaN  40.835994 -73.828379   
4                 NaN  40.733060 -73.874170   

                                   Location  
0   (40.86568153633767, -73.92350095571744)  
1  (40.775945312321085, -73.91509393898605)  
2  (40.870324522111424, -73.88852464418646)  
3   (40.83599404683083, -73.82837939584206)  
4  (40.733059618956815, -73.87416975810375)  

[5 rows x 53 columns]

2. Provide insight on the information and details that the provided dataset carries.¶

In [6]:
# Information of the dataframe
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300698 entries, 0 to 300697
Data columns (total 53 columns):
 #   Column                          Non-Null Count   Dtype  
---  ------                          --------------   -----  
 0   Unique Key                      300698 non-null  int64  
 1   Created Date                    300698 non-null  object 
 2   Closed Date                     298534 non-null  object 
 3   Agency                          300698 non-null  object 
 4   Agency Name                     300698 non-null  object 
 5   Complaint Type                  300698 non-null  object 
 6   Descriptor                      294784 non-null  object 
 7   Location Type                   300567 non-null  object 
 8   Incident Zip                    298083 non-null  float64
 9   Incident Address                256288 non-null  object 
 10  Street Name                     256288 non-null  object 
 11  Cross Street 1                  251419 non-null  object 
 12  Cross Street 2                  250919 non-null  object 
 13  Intersection Street 1           43858 non-null   object 
 14  Intersection Street 2           43362 non-null   object 
 15  Address Type                    297883 non-null  object 
 16  City                            298084 non-null  object 
 17  Landmark                        349 non-null     object 
 18  Facility Type                   298527 non-null  object 
 19  Status                          300698 non-null  object 
 20  Due Date                        300695 non-null  object 
 21  Resolution Description          300698 non-null  object 
 22  Resolution Action Updated Date  298511 non-null  object 
 23  Community Board                 300698 non-null  object 
 24  Borough                         300698 non-null  object 
 25  X Coordinate (State Plane)      297158 non-null  float64
 26  Y Coordinate (State Plane)      297158 non-null  float64
 27  Park Facility Name              300698 non-null  object 
 28  Park Borough                    300698 non-null  object 
 29  School Name                     300698 non-null  object 
 30  School Number                   300698 non-null  object 
 31  School Region                   300697 non-null  object 
 32  School Code                     300697 non-null  object 
 33  School Phone Number             300698 non-null  object 
 34  School Address                  300698 non-null  object 
 35  School City                     300698 non-null  object 
 36  School State                    300698 non-null  object 
 37  School Zip                      300697 non-null  object 
 38  School Not Found                300698 non-null  object 
 39  School or Citywide Complaint    0 non-null       float64
 40  Vehicle Type                    0 non-null       float64
 41  Taxi Company Borough            0 non-null       float64
 42  Taxi Pick Up Location           0 non-null       float64
 43  Bridge Highway Name             243 non-null     object 
 44  Bridge Highway Direction        243 non-null     object 
 45  Road Ramp                       213 non-null     object 
 46  Bridge Highway Segment          213 non-null     object 
 47  Garage Lot Name                 0 non-null       float64
 48  Ferry Direction                 1 non-null       object 
 49  Ferry Terminal Name             2 non-null       object 
 50  Latitude                        297158 non-null  float64
 51  Longitude                       297158 non-null  float64
 52  Location                        297158 non-null  object 
dtypes: float64(10), int64(1), object(42)
memory usage: 121.6+ MB
In [7]:
# Dimension of the dataframe
print(f'Dimension: {df.shape}')
Dimension: (300698, 53)
In [8]:
# Description of the dataframe
df.describe()
Out[8]:
Unique Key Incident Zip X Coordinate (State Plane) Y Coordinate (State Plane) School or Citywide Complaint Vehicle Type Taxi Company Borough Taxi Pick Up Location Garage Lot Name Latitude Longitude
count 3.006980e+05 298083.000000 2.971580e+05 297158.000000 0.0 0.0 0.0 0.0 0.0 297158.000000 297158.000000
mean 3.130054e+07 10848.888645 1.004854e+06 203754.534416 NaN NaN NaN NaN NaN 40.725885 -73.925630
std 5.738547e+05 583.182081 2.175338e+04 29880.183529 NaN NaN NaN NaN NaN 0.082012 0.078454
min 3.027948e+07 83.000000 9.133570e+05 121219.000000 NaN NaN NaN NaN NaN 40.499135 -74.254937
25% 3.080118e+07 10310.000000 9.919752e+05 183343.000000 NaN NaN NaN NaN NaN 40.669796 -73.972142
50% 3.130436e+07 11208.000000 1.003158e+06 201110.500000 NaN NaN NaN NaN NaN 40.718661 -73.931781
75% 3.178446e+07 11238.000000 1.018372e+06 224125.250000 NaN NaN NaN NaN NaN 40.781840 -73.876805
max 3.231065e+07 11697.000000 1.067173e+06 271876.000000 NaN NaN NaN NaN NaN 40.912869 -73.700760

3. Convert the columns "Created Date" and "Closed Date" to datetime datatype and create a new column "Request_Closing_Time" as the time elapsed between request creation and request closing.¶

Converting 'Created Date' and 'Closed Date' column to datetime¶

In [11]:
# Print 'Created Date' column data type
print("'Created Date' Data type: " + str(df['Created Date'].dtype))

# Print 'Closed Date' data type
print("'Closed Date' Data type: " + str(df['Closed Date'].dtype))
'Created Date' Data type: object
'Closed Date' Data type: object
In [12]:
#convert Created Date column to date time 
df['Created Date'] = pd.to_datetime(df['Created Date'], dayfirst=True, format="mixed")

#convert Closed Date column to date time 
df['Closed Date'] = pd.to_datetime(df['Closed Date'], dayfirst=True, format="mixed")
In [13]:
# Print 'Created Date' column data type
print("'Created Date' Data type: " + str(df['Created Date'].dtype))

# Print 'Closed Date' data type
print("'Closed Date' Data type: " + str(df['Closed Date'].dtype))
'Created Date' Data type: datetime64[ns]
'Closed Date' Data type: datetime64[ns]

New column creation: 'Request_Closing_Time'¶

In [15]:
# Creating new column Request_Closing_Time 
df['Request_Closing_Time'] = df['Closed Date'] - df['Created Date']
In [16]:
# Output of the changed columns
df[['Created Date', 'Closed Date', 'Request_Closing_Time']]
Out[16]:
Created Date Closed Date Request_Closing_Time
0 2015-12-31 23:59:45 2016-01-01 00:55:00 0 days 00:55:15
1 2015-12-31 23:59:44 2016-01-01 01:26:00 0 days 01:26:16
2 2015-12-31 23:59:29 2016-01-01 04:51:00 0 days 04:51:31
3 2015-12-31 23:57:46 2016-01-01 07:43:00 0 days 07:45:14
4 2015-12-31 23:56:58 2016-01-01 03:24:00 0 days 03:27:02
... ... ... ...
300693 2015-03-29 00:33:41 NaT NaT
300694 2015-03-29 00:33:28 2015-03-29 02:33:59 0 days 02:00:31
300695 2015-03-29 00:33:03 2015-03-29 03:40:20 0 days 03:07:17
300696 2015-03-29 00:33:02 2015-03-29 04:38:35 0 days 04:05:33
300697 2015-03-29 00:33:01 2015-03-29 04:41:50 0 days 04:08:49

300698 rows × 3 columns

4. Drop irrelevant Columns which are stored below in column_names.¶

In [18]:
# Store column names in a variable
column_names = ['Agency Name','Incident Address','Street Name','Cross Street 1','Cross Street 2',
    'Intersection Street 1', 'Intersection Street 2','Address Type','Park Facility Name','Park Borough',
    'School Name', 'School Number','School Region','School Code','School Phone Number','School Address',
    'School City','School State','School Zip','School Not Found','School or Citywide Complaint','Vehicle Type',
    'Taxi Company Borough','Taxi Pick Up Location','Bridge Highway Name','Bridge Highway Direction',
    'Road Ramp','Bridge Highway Segment','Garage Lot Name','Ferry Direction','Ferry Terminal Name','Landmark',
    'X Coordinate (State Plane)','Y Coordinate (State Plane)','Due Date','Resolution Action Updated Date',
    'Community Board','Facility Type','Location']
In [19]:
# Display all columns in df
print("Column Name")
print("_" * 40)
for column in df.columns:
    print(column)
Column Name
________________________________________
Unique Key
Created Date
Closed Date
Agency
Agency Name
Complaint Type
Descriptor
Location Type
Incident Zip
Incident Address
Street Name
Cross Street 1
Cross Street 2
Intersection Street 1
Intersection Street 2
Address Type
City
Landmark
Facility Type
Status
Due Date
Resolution Description
Resolution Action Updated Date
Community Board
Borough
X Coordinate (State Plane)
Y Coordinate (State Plane)
Park Facility Name
Park Borough
School Name
School Number
School Region
School Code
School Phone Number
School Address
School City
School State
School Zip
School Not Found
School or Citywide Complaint
Vehicle Type
Taxi Company Borough
Taxi Pick Up Location
Bridge Highway Name
Bridge Highway Direction
Road Ramp
Bridge Highway Segment
Garage Lot Name
Ferry Direction
Ferry Terminal Name
Latitude
Longitude
Location
Request_Closing_Time
In [20]:
# Display total columns in df
print("Total columns: ", len(df.columns))
Total columns:  54
In [21]:
# Modify the dataframe 'df' by dropping all columns listed in the variable
df.drop(columns=column_names, inplace=True)
In [22]:
# Display all columns in df
print("Column Name")
print("_" * 40)
for column in df.columns:
    print(column)
Column Name
________________________________________
Unique Key
Created Date
Closed Date
Agency
Complaint Type
Descriptor
Location Type
Incident Zip
City
Status
Resolution Description
Borough
Latitude
Longitude
Request_Closing_Time
In [23]:
# Display total columns in df
print("Total columns: ", len(df.columns))
Total columns:  15

5. Remove the NaN missing values from updated dataframe.¶

In [25]:
# Few datas from the last 
df.tail()
Out[25]:
Unique Key Created Date Closed Date Agency Complaint Type Descriptor Location Type Incident Zip City Status Resolution Description Borough Latitude Longitude Request_Closing_Time
300693 30281872 2015-03-29 00:33:41 NaT NYPD Noise - Commercial Loud Music/Party Club/Bar/Restaurant NaN NaN Open Your complaint has been forwarded to the New Y... Unspecified NaN NaN NaT
300694 30281230 2015-03-29 00:33:28 2015-03-29 02:33:59 NYPD Blocked Driveway Partial Access Street/Sidewalk 11418.0 RICHMOND HILL Closed The Police Department responded and upon arriv... QUEENS 40.694077 -73.846087 0 days 02:00:31
300695 30283424 2015-03-29 00:33:03 2015-03-29 03:40:20 NYPD Noise - Commercial Loud Music/Party Club/Bar/Restaurant 11206.0 BROOKLYN Closed The Police Department responded to the complai... BROOKLYN 40.699590 -73.944234 0 days 03:07:17
300696 30280004 2015-03-29 00:33:02 2015-03-29 04:38:35 NYPD Noise - Commercial Loud Music/Party Club/Bar/Restaurant 10461.0 BRONX Closed The Police Department responded to the complai... BRONX 40.837708 -73.834587 0 days 04:05:33
300697 30281825 2015-03-29 00:33:01 2015-03-29 04:41:50 NYPD Noise - Commercial Loud Music/Party Store/Commercial 10036.0 NEW YORK Closed The Police Department responded to the complai... MANHATTAN 40.760583 -73.985922 0 days 04:08:49
In [26]:
# Check missing values
print("Column Name\t\tTotal NaN")
df.isna().sum()
Column Name		Total NaN
Out[26]:
Unique Key                   0
Created Date                 0
Closed Date               2164
Agency                       0
Complaint Type               0
Descriptor                5914
Location Type              131
Incident Zip              2615
City                      2614
Status                       0
Resolution Description       0
Borough                      0
Latitude                  3540
Longitude                 3540
Request_Closing_Time      2164
dtype: int64
In [27]:
# Modifying the dataframe 'df' by dropping all the NaN/NaT value rows
df.dropna(inplace=True)
In [28]:
# Check missing values
print("Column Name\t\tTotal NaN")
df.isna().sum()
Column Name		Total NaN
Out[28]:
Unique Key                0
Created Date              0
Closed Date               0
Agency                    0
Complaint Type            0
Descriptor                0
Location Type             0
Incident Zip              0
City                      0
Status                    0
Resolution Description    0
Borough                   0
Latitude                  0
Longitude                 0
Request_Closing_Time      0
dtype: int64
In [29]:
# Updated data shown from the last rows
df.tail()
Out[29]:
Unique Key Created Date Closed Date Agency Complaint Type Descriptor Location Type Incident Zip City Status Resolution Description Borough Latitude Longitude Request_Closing_Time
300692 30281370 2015-03-29 00:34:32 2015-03-29 01:13:01 NYPD Noise - Commercial Loud Music/Party Store/Commercial 10002.0 NEW YORK Closed The Police Department responded to the complai... MANHATTAN 40.716053 -73.991378 0 days 00:38:29
300694 30281230 2015-03-29 00:33:28 2015-03-29 02:33:59 NYPD Blocked Driveway Partial Access Street/Sidewalk 11418.0 RICHMOND HILL Closed The Police Department responded and upon arriv... QUEENS 40.694077 -73.846087 0 days 02:00:31
300695 30283424 2015-03-29 00:33:03 2015-03-29 03:40:20 NYPD Noise - Commercial Loud Music/Party Club/Bar/Restaurant 11206.0 BROOKLYN Closed The Police Department responded to the complai... BROOKLYN 40.699590 -73.944234 0 days 03:07:17
300696 30280004 2015-03-29 00:33:02 2015-03-29 04:38:35 NYPD Noise - Commercial Loud Music/Party Club/Bar/Restaurant 10461.0 BRONX Closed The Police Department responded to the complai... BRONX 40.837708 -73.834587 0 days 04:05:33
300697 30281825 2015-03-29 00:33:01 2015-03-29 04:41:50 NYPD Noise - Commercial Loud Music/Party Store/Commercial 10036.0 NEW YORK Closed The Police Department responded to the complai... MANHATTAN 40.760583 -73.985922 0 days 04:08:49

6. Show the unique values from all the columns in the dataframe.¶

In [31]:
# Loop through each column in dataframe
for column in df.columns:
    # Print column name, column data, total unique values
    print(f'Column Name: {column}')  
    print(f'Total Unique Values: {df[column].nunique()}')
    print(f'Unique Values: \n{df[column].unique()}')
    print('_' * 100)
Column Name: Unique Key
Total Unique Values: 291107
Unique Values: 
[32310363 32309934 32309159 ... 30283424 30280004 30281825]
____________________________________________________________________________________________________
Column Name: Created Date
Total Unique Values: 251970
Unique Values: 
<DatetimeArray>
['2015-12-31 23:59:45', '2015-12-31 23:59:44', '2015-12-31 23:59:29',
 '2015-12-31 23:57:46', '2015-12-31 23:56:58', '2015-12-31 23:56:30',
 '2015-12-31 23:55:32', '2015-12-31 23:54:05', '2015-12-31 23:53:58',
 '2015-12-31 23:52:58',
 ...
 '2015-03-29 00:42:48', '2015-03-29 00:37:15', '2015-03-29 00:35:28',
 '2015-03-29 00:35:23', '2015-03-29 00:35:04', '2015-03-29 00:34:32',
 '2015-03-29 00:33:28', '2015-03-29 00:33:03', '2015-03-29 00:33:02',
 '2015-03-29 00:33:01']
Length: 251970, dtype: datetime64[ns]
____________________________________________________________________________________________________
Column Name: Closed Date
Total Unique Values: 231991
Unique Values: 
<DatetimeArray>
['2016-01-01 00:55:00', '2016-01-01 01:26:00', '2016-01-01 04:51:00',
 '2016-01-01 07:43:00', '2016-01-01 03:24:00', '2016-01-01 01:50:00',
 '2016-01-01 01:53:00', '2016-01-01 01:42:00', '2016-01-01 08:27:00',
 '2016-01-01 01:17:00',
 ...
 '2015-03-29 00:57:23', '2015-03-29 02:57:41', '2015-03-29 01:02:39',
 '2015-03-29 04:14:27', '2015-03-29 08:41:24', '2015-03-29 02:52:28',
 '2015-03-29 01:13:01', '2015-03-29 02:33:59', '2015-03-29 04:38:35',
 '2015-03-29 04:41:50']
Length: 231991, dtype: datetime64[ns]
____________________________________________________________________________________________________
Column Name: Agency
Total Unique Values: 1
Unique Values: 
['NYPD']
____________________________________________________________________________________________________
Column Name: Complaint Type
Total Unique Values: 15
Unique Values: 
['Noise - Street/Sidewalk' 'Blocked Driveway' 'Illegal Parking'
 'Derelict Vehicle' 'Noise - Commercial' 'Noise - House of Worship'
 'Posting Advertisement' 'Noise - Vehicle' 'Animal Abuse' 'Vending'
 'Traffic' 'Drinking' 'Noise - Park' 'Graffiti' 'Disorderly Youth']
____________________________________________________________________________________________________
Column Name: Descriptor
Total Unique Values: 41
Unique Values: 
['Loud Music/Party' 'No Access' 'Commercial Overnight Parking'
 'Blocked Sidewalk' 'Posted Parking Sign Violation' 'Blocked Hydrant'
 'With License Plate' 'Partial Access' 'Unauthorized Bus Layover'
 'Double Parked Blocking Vehicle' 'Vehicle' 'Loud Talking'
 'Banging/Pounding' 'Car/Truck Music' 'Tortured' 'In Prohibited Area'
 'Double Parked Blocking Traffic' 'Congestion/Gridlock' 'Neglected'
 'Car/Truck Horn' 'In Public' 'Other (complaint details)' 'No Shelter'
 'Truck Route Violation' 'Unlicensed' 'Overnight Commercial Storage'
 'Engine Idling' 'After Hours - Licensed Est' 'Detached Trailer'
 'Underage - Licensed Est' 'Chronic Stoplight Violation' 'Loud Television'
 'Chained' 'Building' 'In Car' 'Police Report Requested'
 'Chronic Speeding' 'Playing in Unsuitable Place' 'Drag Racing'
 'Police Report Not Requested' 'Nuisance/Truant']
____________________________________________________________________________________________________
Column Name: Location Type
Total Unique Values: 14
Unique Values: 
['Street/Sidewalk' 'Club/Bar/Restaurant' 'Store/Commercial'
 'House of Worship' 'Residential Building/House' 'Residential Building'
 'Park/Playground' 'Vacant Lot' 'House and Store' 'Highway' 'Commercial'
 'Roadway Tunnel' 'Subway Station' 'Parking Lot']
____________________________________________________________________________________________________
Column Name: Incident Zip
Total Unique Values: 200
Unique Values: 
[10034. 11105. 10458. 10461. 11373. 11215. 10032. 10457. 11415. 11219.
 11372. 10453. 11208. 11379. 11374. 11412. 11217. 11234. 10026. 10456.
 10030. 10467. 11432. 10031. 11419. 10024. 11201. 11216. 10462. 11385.
 11414. 11213. 11375. 11211. 10312. 10017. 11417. 10002. 10027. 11209.
 10035. 11418. 11421. 11205. 10468. 11355. 11358. 11210. 11368. 11427.
 11436. 10308. 11364. 10011. 11423. 11230. 10003. 11221. 11416. 11378.
 11236. 11218. 10029. 10028. 11214. 11207. 11369. 11223. 11220. 10302.
 11420. 11354. 10473. 10301. 11103. 10465. 11377. 11212. 11365. 10472.
 10452. 11203. 10469. 11237. 11434. 11101. 10460. 11229. 11206. 11102.
 10466. 10009. 10033. 11694. 10022. 10470. 11433. 11428. 11413. 10463.
 10471. 10474. 11228. 10014. 10475. 11225. 11233. 11370. 11204. 11435.
 10459. 11238. 10304. 11367. 10305. 10001. 10314. 10019. 11222. 10023.
 11356. 11235. 10018. 10036. 11106. 10075. 10451. 11366. 10005. 10303.
 10455. 11361. 10309. 10013. 11226. 10012. 11224. 10016. 11249. 10039.
 10128. 10454. 10010. 11360. 11004. 11691. 10025. 10307. 11232. 10038.
 10310. 10040. 11426. 10306. 11362. 11411. 11429. 11422. 10007. 10065.
 10021. 10004. 11104. 11231. 11357. 11239. 11363. 10037. 11693. 10280.
 11430. 10464. 10006. 11692. 10044. 11001. 10282. 11371. 10281. 11109.
 11040.    83. 10020. 10000. 11697. 11251. 10103. 10112. 10069. 11451.
 10153. 10041. 11242. 10119. 10048. 10803. 11695. 10111. 10162. 10123.]
____________________________________________________________________________________________________
Column Name: City
Total Unique Values: 53
Unique Values: 
['NEW YORK' 'ASTORIA' 'BRONX' 'ELMHURST' 'BROOKLYN' 'KEW GARDENS'
 'JACKSON HEIGHTS' 'MIDDLE VILLAGE' 'REGO PARK' 'SAINT ALBANS' 'JAMAICA'
 'SOUTH RICHMOND HILL' 'RIDGEWOOD' 'HOWARD BEACH' 'FOREST HILLS'
 'STATEN ISLAND' 'OZONE PARK' 'RICHMOND HILL' 'WOODHAVEN' 'FLUSHING'
 'CORONA' 'QUEENS VILLAGE' 'OAKLAND GARDENS' 'HOLLIS' 'MASPETH'
 'EAST ELMHURST' 'SOUTH OZONE PARK' 'WOODSIDE' 'FRESH MEADOWS'
 'LONG ISLAND CITY' 'ROCKAWAY PARK' 'SPRINGFIELD GARDENS' 'COLLEGE POINT'
 'BAYSIDE' 'GLEN OAKS' 'FAR ROCKAWAY' 'BELLEROSE' 'LITTLE NECK'
 'CAMBRIA HEIGHTS' 'ROSEDALE' 'SUNNYSIDE' 'WHITESTONE' 'ARVERNE'
 'FLORAL PARK' 'NEW HYDE PARK' 'CENTRAL PARK' 'BREEZY POINT' 'QUEENS'
 'Astoria' 'Long Island City' 'Woodside' 'East Elmhurst' 'Howard Beach']
____________________________________________________________________________________________________
Column Name: Status
Total Unique Values: 1
Unique Values: 
['Closed']
____________________________________________________________________________________________________
Column Name: Resolution Description
Total Unique Values: 12
Unique Values: 
['The Police Department responded and upon arrival those responsible for the condition were gone.'
 'The Police Department responded to the complaint and with the information available observed no evidence of the violation at that time.'
 'The Police Department responded to the complaint and took action to fix the condition.'
 'The Police Department issued a summons in response to the complaint.'
 'The Police Department responded to the complaint and determined that police action was not necessary.'
 'The Police Department reviewed your complaint and provided additional information below.'
 'Your request can not be processed at this time because of insufficient contact information. Please create a new Service Request on NYC.gov and provide more detailed contact information.'
 "This complaint does not fall under the Police Department's jurisdiction."
 'The Police Department responded to the complaint and a report was prepared.'
 'The Police Department responded to the complaint but officers were unable to gain entry into the premises.'
 'The Police Department made an arrest in response to the complaint.'
 "Your complaint has been forwarded to the New York Police Department for a non-emergency response. If the police determine the vehicle is illegally parked, they will ticket the vehicle and then you may either contact a private towing company to remove the vehicle or ask your local precinct to contact 'rotation tow'. Any fees charged for towing will have to be paid by the vehicle owner. 311 will have additional information in 8 hours. Please note your service request number for future reference."]
____________________________________________________________________________________________________
Column Name: Borough
Total Unique Values: 5
Unique Values: 
['MANHATTAN' 'QUEENS' 'BRONX' 'BROOKLYN' 'STATEN ISLAND']
____________________________________________________________________________________________________
Column Name: Latitude
Total Unique Values: 123013
Unique Values: 
[40.86568154 40.77594531 40.87032452 ... 40.77664592 40.70635259
 40.71605291]
____________________________________________________________________________________________________
Column Name: Longitude
Total Unique Values: 123112
Unique Values: 
[-73.92350096 -73.91509394 -73.88852464 ... -73.94880526 -73.87124456
 -73.9913785 ]
____________________________________________________________________________________________________
Column Name: Request_Closing_Time
Total Unique Values: 53225
Unique Values: 
<TimedeltaArray>
['0 days 00:55:15', '0 days 01:26:16', '0 days 04:51:31', '0 days 07:45:14',
 '0 days 03:27:02', '0 days 01:53:30', '0 days 01:57:28', '0 days 01:47:55',
 '0 days 08:33:02', '0 days 01:23:02',
 ...
 '0 days 12:33:42', '0 days 07:28:23', '0 days 05:13:46', '0 days 05:19:11',
 '0 days 10:22:47', '0 days 09:46:41', '0 days 15:40:46', '0 days 04:44:52',
 '0 days 09:44:44', '0 days 15:42:26']
Length: 53225, dtype: timedelta64[ns]
____________________________________________________________________________________________________

7. Show summary statistics of sum, mean, standard deviation, skewness, and kurtosis of the data frame.¶

In [34]:
# Set format as so it displays full value
pd.set_option('display.float_format', '{:,.2f}'.format)

# Request Closing Time in Hours
df['Request_Closing_Time_In_Hours'] = df['Request_Closing_Time'].dt.total_seconds() / 3600

# Calculate sum()
print("Sum Calculation:\nColumn Name\tSum")
df[[
    'Unique Key', 
    'Incident Zip', 
    'Latitude', 
    'Longitude', 
    'Request_Closing_Time_In_Hours'
]].sum()
Sum Calculation:
Column Name	Sum
Out[34]:
Unique Key                      9,112,107,955,295.00
Incident Zip                        3,160,833,212.00
Latitude                               11,855,530.76
Longitude                             -21,520,095.17
Request_Closing_Time_In_Hours           2,242,774.45
dtype: float64
In [36]:
# Calculate mean
print("Mean Calculation:\nColumn Name\tMean")
df[[
    'Unique Key', 
    'Incident Zip', 
    'Latitude', 
    'Longitude', 
    'Request_Closing_Time'
]].mean()
Mean Calculation:
Column Name	Mean
Out[36]:
Unique Key                          31,301,576.24
Incident Zip                            10,857.98
Latitude                                    40.73
Longitude                                  -73.93
Request_Closing_Time    0 days 07:42:15.464980230
dtype: object
In [38]:
# Set format back to default
pd.reset_option('display.float_format')
In [40]:
# Calculate standard deviation
print("Standard Deviation Calculation:\nColumn Name\tStd")
df.std(numeric_only=True)
Standard Deviation Calculation:
Column Name	Std
Out[40]:
Unique Key                       575377.738707
Incident Zip                        580.280774
Latitude                              0.082411
Longitude                             0.078654
Request_Closing_Time_In_Hours       567.208468
dtype: float64
In [42]:
# Calculate Skewness
print("Skewness Calculation:\nColumn Name\tSkewness")
df.skew(numeric_only=True)
Skewness Calculation:
Column Name	Skewness
Out[42]:
Unique Key                       0.016898
Incident Zip                    -2.553956
Latitude                         0.123114
Longitude                       -0.312739
Request_Closing_Time_In_Hours   -8.142360
dtype: float64
In [44]:
# Calculate Kurtosis
print("Kurtosis Calculation:\nColumn Name\tKurtosis")
df.kurt(numeric_only=True)
Kurtosis Calculation:
Column Name	Kurtosis
Out[44]:
Unique Key                       -1.176593
Incident Zip                     37.827777
Latitude                         -0.734818
Longitude                         1.455600
Request_Closing_Time_In_Hours    87.306717
dtype: float64

8. Calculate and show correlation of all variables.¶

In [47]:
# importing seaborn and matplotlib
import seaborn as sns
import matplotlib.pyplot as plt
In [48]:
# Filter relevant numeric columns
correlation_df = df[[
    'Unique Key', 
    'Incident Zip', 
    'Latitude', 
    'Longitude', 
    'Request_Closing_Time'
]]
In [49]:
# Compute correlation matrix
corr_matrix = correlation_df.corr()
corr_matrix
Out[49]:
Unique Key Incident Zip Latitude Longitude Request_Closing_Time
Unique Key 1.000000 0.025492 -0.032613 -0.008621 0.004666
Incident Zip 0.025492 1.000000 -0.499081 0.385934 -0.002069
Latitude -0.032613 -0.499081 1.000000 0.368819 0.002606
Longitude -0.008621 0.385934 0.368819 1.000000 0.005605
Request_Closing_Time 0.004666 -0.002069 0.002606 0.005605 1.000000
In [50]:
# Plot correlation heatmap
sns.heatmap(corr_matrix, annot=True, fmt=".3f", cmap="coolwarm")
plt.title("Correlation Matrix")
plt.show()
No description has been provided for this image

9. Four major insights through visualization that you come up after data mining.¶

*INSIGHT 1: Spatial Density by Complaint Types*

In [53]:
# Spatial Density by Complaint Types
# List of selected complaint types
complaint_types = list(df['Complaint Type'].unique())

# Loop to create a hexbin plot for each complaint type
for complaint in complaint_types:
    subset = df[df['Complaint Type'] == complaint]
    # Plot hexbar with appropriate configurations
    plt.figure(figsize=(12, 10))
    plt.hexbin(
        subset['Longitude'], 
        subset['Latitude'], 
        gridsize=150, 
        cmap='viridis', 
        bins='log', 
        mincnt=1
    )
    plt.colorbar(label='Log(Count)')
    plt.title(f'Spatial Density of: {complaint}', fontsize=16)
    plt.xlabel('Longitude')
    plt.ylabel('Latitude')
    plt.grid(True)
    # Fix vertical stretching
    plt.gca().set_aspect('equal', adjustable='box')
    plt.tight_layout()
    plt.show()
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image
No description has been provided for this image

*INSIGHT 2: Complaint Distribution by Borough*

In [55]:
# Storing borough counts
borough_counts = df["Borough"].value_counts()

# Plot pie chart with appropriate configurations
plt.figure(figsize=(7, 7))
plt.pie(borough_counts, labels=borough_counts.index, autopct='%1.1f%%', startangle=140)
plt.title("Insight 2: Complaint Distribution by Borough")
plt.tight_layout()
plt.show()
No description has been provided for this image

*Insight 3: Requests by Hour of Day*

In [57]:
# Created hour of the day
df['Created Hour'] = df['Created Date'].dt.hour

# Histogram of requests by hour
sns.histplot(df['Created Hour'], bins=24, color='purple')
plt.title("Insight 3: Requests by Hour of Day")
plt.xlabel("Hour")
plt.ylabel("Number of Requests")
plt.xticks(range(24))
plt.show()
No description has been provided for this image

*Insight 4: Number of Complaints*

In [59]:
# Number of complaints of each complaint type
complaints = df["Complaint Type"].value_counts()

# Plot bar graph of most common complaint types
plt.figure(figsize=(12, 6))
sns.barplot(x=complaints.values, y=complaints.index, palette="flare")
plt.title("Insight 4: Most Common Complaint Types")
plt.xlabel("Number of Complaints")
plt.ylabel("Complaint Type")
plt.tight_layout()
plt.show()
C:\Users\Anshu\AppData\Local\Temp\ipykernel_12536\1526219640.py:6: FutureWarning: 

Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.

  sns.barplot(x=complaints.values, y=complaints.index, palette="flare")
No description has been provided for this image

10. Arranging the complaint types according to their average 'Request_Closing_Time', categorized by various locations.¶

In [64]:
# Create 'Request_Closing_Time' in hours
df = df[df['Request_Closing_Time_In_Hours'] >= 0]

# Group by Complaint Type and Borough and calculate mean closing time
grouped_df = df.groupby(['Complaint Type', 'Borough'])['Request_Closing_Time_In_Hours'].mean().reset_index()

# Pivot the data for visualization
pivot_df = grouped_df.pivot(index='Complaint Type', columns='Borough', values='Request_Closing_Time_In_Hours')

# Select top 10 complaint types for better readability
top_complaints = df['Complaint Type'].value_counts().index
pivot_df = pivot_df.loc[pivot_df.index.intersection(top_complaints)]

# Plotting grouped bar chart
pivot_df.plot(kind='bar', figsize=(14, 7), colormap='tab10')
plt.title('Average Request Closing Time by Complaint Type and Borough')
plt.ylabel('Avg Request Closing Time (Hours)')
plt.xlabel('Complaint Type')
plt.xticks(rotation=45, ha='right')
plt.legend(title='Borough')
plt.tight_layout()
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
No description has been provided for this image
In [66]:
import plotly.express as px

# Group by Complaint Type and Borough and calculate mean closing time
grouped_df = df.groupby(['Complaint Type', 'Location Type'])['Request_Closing_Time_In_Hours'].mean().reset_index()

# Create a treemap to visualize average request closing times
# broken down by Location Type and Complaint Type
fig = px.treemap(
    grouped_df,
    path=['Location Type', 'Complaint Type'],  # Hierarchical path for grouping
    values='Request_Closing_Time_In_Hours',   # Size of each block based on avg closing time
    color='Request_Closing_Time_In_Hours',    # Color intensity based on avg closing time
    color_continuous_scale='Blues',           # Use a blue color gradient
    title='Avg Request Closing Time by Location Type & Complaint Type'
)

# Customize the figure layout to increase visual clarity
fig.update_layout(
    width=1000,    # Set figure width 
    height=700,    # Set figure height
)

# Display the interactive treemap
fig.show()

11. Statistical Tests¶

*Test 1: Average response time across complaint types is similar or not.*

In [68]:
from scipy.stats import f_oneway

# Use top 10 complaint types for reliability
top_types = df['Complaint Type'].value_counts().index
filtered_df = df[df['Complaint Type'].isin(top_types)]

# Group by complaint type and prepare numeric durations
groups = [
    group['Request_Closing_Time_In_Hours'].dropna()
    for name, group in filtered_df.groupby('Complaint Type')
]

# Perform ANOVA test
f_stat, p_value = f_oneway(*groups)
print("ANOVA F-statistics:", f_stat)
print("ANOVA p-value:", p_value)
if p_value < 0.05:
    print("Reject Ho: Complaint Type and Average Response Time are associated.")
else:
    print("Fail to reject Ho: No association.")
ANOVA F-statistics: 50.06668185699929
ANOVA p-value: 2.462779063124107e-140
Reject Ho: Complaint Type and Average Response Time are associated.

*Test 2: Type of complaint or service requested and location are related*

In [70]:
from scipy.stats import chi2_contingency

# Create a contingency table
contingency_table = pd.crosstab(df['Complaint Type'], df['Borough'])
contingency_table
Out[70]:
Borough BRONX BROOKLYN MANHATTAN QUEENS STATEN ISLAND
Complaint Type
Animal Abuse 1386 2362 1500 1843 544
Blocked Driveway 12540 27830 2036 31179 2110
Derelict Vehicle 1916 5096 527 7959 1744
Disorderly Youth 59 71 66 59 21
Drinking 181 255 290 348 174
Graffiti 8 42 22 36 2
Illegal Parking 7689 27076 11902 21587 4807
Noise - Commercial 2394 11306 14286 5986 665
Noise - House of Worship 75 334 188 297 17
Noise - Park 512 1512 1152 626 67
Noise - Street/Sidewalk 8657 13109 20143 4326 793
Noise - Vehicle 3298 5081 5332 2565 350
Posting Advertisement 16 45 41 30 500
Traffic 344 1081 1523 1283 196
Vending 375 512 2373 469 25
In [71]:
# Perform chi-square test
chi2_stat, p_value, dof, expected = chi2_contingency(contingency_table)
print("Chi-square Statistic:", chi2_stat)
print("Chi-square p-value:", p_value)
if p_value < 0.05:
    print("Reject Ho: Complaint Type and Borough are associated.")
else:
    print("Fail to reject Ho: No association.")
Chi-square Statistic: 72029.76460562609
Chi-square p-value: 0.0
Reject Ho: Complaint Type and Borough are associated.

---THE END---

In [ ]: